Target-Aware Lattice Rescoring for Dialect Recognition

Authors

  • Rong Tong
  • Bin Ma
  • Haizhou Li
  • Chng Eng Siong
Abstract

We observed that human listeners distinguish one dialect from another by paying special attention to particular phonetic and/or phonotactic patterns. Motivated by this observation, we propose a technique that emulates this process. We explore a target-aware lattice rescoring (TALR) process that revises the n-gram statistics in a lattice with target dialect information. We then derive n-gram statistics from the lattice as phonotactic features and develop a system under the vector space modeling framework. The experimental results show that the proposed technique consistently improves dialect recognition performance on 30-second test utterances. With 3-gram statistics, we achieved equal error rates (EERs) of 4.57% and 13.28% for Chinese and English dialect recognition, respectively, on the 30-second closed test sets of the 2007 NIST Language Recognition Evaluation.
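The pipeline described in the abstract (n-gram statistics from a lattice, a target-dependent revision, then a vector space representation) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the lattice is simplified to a list of phone paths with posterior probabilities, and the function names `expected_ngram_counts`, `rescore`, and `to_unit_vector` are hypothetical. In particular, the abstract does not give the actual TALR rescoring rule, so the multiplicative per-target weighting below is only a placeholder, not the authors' method.

```python
from collections import Counter
from math import sqrt


def expected_ngram_counts(paths, n=3):
    """Expected n-gram counts over a lattice approximated as weighted paths.

    paths: list of (phone_sequence, posterior) pairs (a simplifying assumption;
    a real lattice would be traversed with forward-backward scores).
    """
    counts = Counter()
    for phones, posterior in paths:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += posterior
    return counts


def rescore(counts, target_weights, default=1.0):
    """Placeholder target-aware step (NOT the paper's rule): scale each
    n-gram count by a weight assumed to be learned for one target dialect."""
    return {g: c * target_weights.get(g, default) for g, c in counts.items()}


def to_unit_vector(counts):
    """L2-normalise the n-gram statistics into a sparse feature vector."""
    norm = sqrt(sum(c * c for c in counts.values()))
    return {g: c / norm for g, c in counts.items()} if norm else {}


if __name__ == "__main__":
    # Two competing phone paths with hypothetical posteriors.
    lattice = [(["a", "b", "c"], 0.6), (["a", "b", "d"], 0.4)]
    stats = expected_ngram_counts(lattice, n=2)
    # Hypothetical weights that emphasise one bigram for a given target dialect.
    revised = rescore(stats, {("b", "c"): 2.0})
    print(to_unit_vector(revised))
```

Under these assumptions, one such vector would be produced per target dialect and passed to a vector space classifier; the choice of classifier is not specified in the abstract.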

Similar resources

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition

Using context in automatic speech recognition allows the recognition system to dynamically task-adapt and bring gains to a broad variety of use-cases. An important mechanism of context inclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real-time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...

On-the-fly lattice rescoring for real-time automatic speech recognition

This paper presents a method for rescoring the speech recognition lattices on-the-fly to increase the word accuracy while preserving low latency of a real-time speech recognition system. In large vocabulary speech recognition systems, pruned and/or lower order n-gram language models are often used in the first-pass of the speech decoder due to the computational complexity. The output word latti...

Tone information as a confidence measure for improving Cantonese LVCSR

Cantonese, a syllabically paced, southern Chinese dialect, is also a tonal language. A Cantonese syllable can have up to 9 different tone patterns which are lexically important. In this paper after reviewing major approaches to incorporating tone information into a large vocabulary continuous speech recognition (LVCSR) system, we propose two schemes to employ the tone information as a confidenc...

Fuzzy class rescoring: a part-of-speech language model

Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...

Phonetic recognition using a statistical hidden dynamic model of speech

This paper presents new results on evaluation of the statistical coarticulatory hidden dynamic model (HDM) on the TIMIT phone recognition task. We train both the HDM and baseline HMM on the complete TIMIT training data set and evaluate both systems using the N-best rescoring algorithm on the TIMIT test data set and the dr8 dialect subset. We show that with the inclusion of the reference transcr...

Publication date: 2011